An OLAP-based Scalable Web Access Analysis Engine

Authors

  • Qiming Chen
  • Umeshwar Dayal
  • Meichun Hsu
Abstract

Collecting and mining web log records (WLRs) from e-commerce web sites has become increasingly important for targeted marketing, promotions, and traffic analysis. In this paper, we describe a scalable data warehousing and OLAP-based engine for analyzing WLRs. We have to address several scalability and performance challenges in developing such a framework. Because an active web site may generate hundreds of millions of WLRs daily, we have to deal with huge data volumes and data flow rates. To support fine-grained analysis, e.g., individual users’ access profiles, we end up with huge, sparse data cubes defined over very large dimensions (there may be hundreds of thousands of visitors to the site and tens of thousands of pages). While OLAP servers store sparse cubes quite efficiently, rolling up a very large cube can take prohibitively long. We have applied several nontraditional approaches to deal with this problem, which allow us to speed up WLR analysis by three orders of magnitude. Our framework supports multilevel and multidimensional pattern extraction, analysis, and feature ranking, and, in addition to the typical OLAP operations, supports data mining operations such as extended multilevel and multidimensional association rules.
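
To make the notions of a sparse cube and a roll-up concrete, the following minimal Python sketch stores only the non-empty cells of a small (visitor, page, hour) cube and aggregates them along one dimension. It is an illustration under stated assumptions, not the engine described in the abstract: the sample records, the function names `build_cube`, `roll_up`, and `page_section`, and the page-to-section hierarchy are all hypothetical.

```python
from collections import defaultdict

# Hypothetical web log records: (visitor, page, hour_of_day).
records = [
    ("v1", "/shop/shoes", 9),
    ("v1", "/shop/shoes", 9),
    ("v1", "/shop/hats", 10),
    ("v2", "/help/faq", 9),
    ("v2", "/shop/shoes", 14),
]

def build_cube(rows):
    """Build a sparse cube: only non-empty (visitor, page, hour) cells are stored."""
    cube = defaultdict(int)
    for visitor, page, hour in rows:
        cube[(visitor, page, hour)] += 1
    return dict(cube)

def roll_up(cube, key_fn):
    """Aggregate hit counts by mapping each fine-grained cell key to a coarser key."""
    out = defaultdict(int)
    for key, count in cube.items():
        out[key_fn(key)] += count
    return dict(out)

def page_section(page):
    """Assumed page hierarchy for illustration: '/shop/shoes' -> 'shop'."""
    return page.strip("/").split("/")[0]

cube = build_cube(records)
# Roll up over page and hour, keeping only the visitor dimension.
hits_per_visitor = roll_up(cube, lambda k: k[0])
# Roll up to a coarser level of the (assumed) page hierarchy.
hits_per_section = roll_up(cube, lambda k: page_section(k[1]))
```

The roll-up visits every stored cell once, so its cost grows with the number of non-empty cells. This is why very large cubes are expensive to aggregate even when they are stored sparsely.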

Similar resources

Hybrid Cloud Support for Large Scale Analytics and Web Processing

Platform-as-a-service (PaaS) systems, such as Google App Engine (GAE), simplify web application development and cloud deployment by providing developers with complete software stacks: runtime systems and scalable services accessible from well-defined APIs. Extant PaaS offerings are designed and specialized to support large numbers of concurrently executing web applications (multi-tier programs ...

A Data-Warehouse/OLAP Framework for Scalable Telecommunication Tandem Traffic Analysis

In a telecommunication network, hundreds of millions of call detail records (CDRs) are generated daily. Applications such as tandem traffic analysis require the collection and mining of CDRs on a continuous basis. The data volumes and data flow rates pose serious scalability and performance challenges. This has motivated us to develop a scalable datawarehouse/OLAP framework, and based on this f...

Discovering and Mining User Web-page Traversal Patterns

As the popularity of WWW explodes, a massive amount of data is gathered by Web servers in the form of Web access logs. This is a rich source of information for understanding Web user surfing behavior. Web Usage Mining, also known as Web Log Mining, is an application of data mining algorithms to Web access logs to find trends and regularities in Web users' traversal patterns. The results of Web ...

OLAP-based Scalable Profiling of Customer Behavior

Profiling customers’ behavior has become increasingly important for many applications such as fraud detection, targeted marketing and promotion. Customer behavior profiles are created from very large collections of transaction data. This has motivated us to develop a data-warehouse and OLAP based, scalable and flexible profiling engine. We define profiles by probability distributions, and compu...

OLAP For Multicriteria Maintenance Scheduling

Widely used in decision support systems, OLAP (Online Analytical Processing) technology facilitates interactive analysis of multi-dimensional data of varied granularities. In this paper, we demonstrate an interesting application of OLAP in solving multicriteria maintenance scheduling problems. Maintenance scheduling has many important applications, such as maintenance of inverted indexes for se...

Publication date: 2000